Article 1319

Title of the article

A NOISE-ROBUST ALGORITHM TO DETERMINE PROSODIC CHARACTERISTICS OF SPEECH SIGNALS FOR SYSTEMS OF HUMAN PSYCHO-EMOTIONAL STATE ASSESSMENT 

Authors

Alimuradov Alan Kazanferovich, Candidate of engineering sciences, director of student research and production business incubator, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: alansapfir@yandex.ru
Tychkov Aleksandr Yur'evich, Candidate of engineering sciences, deputy director of the Research Institute for Basic and Applied Studies, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: tychkov-a@mail.ru
Churakov Petr Pavlovich, Doctor of engineering sciences, professor, sub-department of information and measuring equipment and metrology, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: churakov-pp@mail.ru
Artamonov Dmitriy Vladimirovich, Doctor of engineering sciences, professor, First Vice Rector of Penza State University (40 Krasnaya street, Penza, Russia), E-mail: aius@pnzgu.ru 

UDC index

004.934 

DOI

10.21685/2072-3059-2019-3-1 

Abstract

Background. Over the last decade, automated assessment of a person's psycho-emotional state from speech has been actively developed. In a complex, “aggressive” noise environment, every speech signal is noisy to some degree. Depending on its intensity and type, noise can significantly distort the results of psycho-emotional state assessment. The purpose of the study is to develop a noise-robust algorithm for determining the prosodic characteristics of speech signals in order to increase the effectiveness of human psycho-emotional state assessment.
Materials and methods. The algorithm is built on an adaptive decomposition technique for non-stationary signals, namely the improved complete ensemble empirical mode decomposition with adaptive noise. The software implementation was carried out in the MATLAB mathematical modeling environment (MathWorks).
Results. A noise-robust algorithm for determining the prosodic characteristics of speech has been developed. It consists of decomposing a speech signal into informative noise and signal frequency components using the improved complete ensemble empirical mode decomposition with adaptive noise, selecting the component that contains the pitch, and then determining the prosodic characteristics. The algorithm was studied on a verified database of clean and noisy speech signals recorded from 220 men and women aged 18 to 79 with signs of psycho-emotional disorders.
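The selection and measurement steps described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the frequency components have already been produced by some decomposition, and it swaps in two simple stand-ins, a zero-crossing frequency estimate to pick the component and an autocorrelation peak to measure the pitch. The 60 to 400 Hz search band and all function names are hypothetical.

```python
import math

PITCH_BAND_HZ = (60.0, 400.0)  # assumed pitch search range, not taken from the article


def zero_crossing_freq(x, fs):
    """Rough mean frequency (Hz) of a narrow-band component from its sign changes."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0.0) != (b < 0.0))
    return crossings * fs / (2.0 * len(x))


def select_pitch_component(components, fs, band=PITCH_BAND_HZ):
    """Index of the highest-energy component whose frequency lies inside the band.

    Returns None when no component falls inside the band.
    """
    best_idx, best_energy = None, 0.0
    for idx, comp in enumerate(components):
        freq = zero_crossing_freq(comp, fs)
        energy = sum(v * v for v in comp)
        if band[0] <= freq <= band[1] and energy > best_energy:
            best_idx, best_energy = idx, energy
    return best_idx


def estimate_f0(x, fs, band=PITCH_BAND_HZ):
    """Pitch estimate (Hz): fs divided by the lag of the autocorrelation maximum."""
    lag_min = max(1, int(fs / band[1]))
    lag_max = min(len(x) - 1, int(fs / band[0]))
    best_lag = max(
        range(lag_min, lag_max + 1),
        key=lambda lag: sum(a * b for a, b in zip(x, x[lag:])),
    )
    return fs / best_lag


# Synthetic stand-ins for decomposition output: a high-frequency component,
# a pitch-like component at 120 Hz, and a slow trend.
fs = 8000
t = [i / fs for i in range(800)]
components = [
    [0.5 * math.sin(2 * math.pi * 1000 * ti + 0.3) for ti in t],
    [1.0 * math.sin(2 * math.pi * 120 * ti + 0.3) for ti in t],
    [2.0 * math.sin(2 * math.pi * 5 * ti + 0.3) for ti in t],
]
pitch_idx = select_pitch_component(components, fs)
f0 = estimate_f0(components[pitch_idx], fs)
```

Restricting the choice to a frequency band before comparing energies keeps a strong slow trend or a strong high-frequency component from being mistaken for the pitch. The resolution of the autocorrelation estimate is limited by the integer lag, so the result is only approximate at this sampling rate.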
Conclusions. The study showed that the proposed algorithm is robust to noise of various intensities (signal-to-noise ratios from 0 to 30 dB) and can be tested under the real conditions of an “aggressive” noise environment in systems for assessing human psycho-emotional state.
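To make the 0 to 30 dB range concrete, the sketch below shows one common way to build a noisy test signal at a chosen signal-to-noise ratio: the noise is scaled so that 10·log10(P_signal / P_noise) equals the target value. This is an illustration under assumptions, not the procedure used to build the authors' database; the function names, the synthetic tone, and the Gaussian noise are hypothetical.

```python
import math
import random


def mix_at_snr(clean, noise, target_db):
    """Scale the noise to the target SNR (dB) relative to the clean signal and add it."""
    p_clean = sum(v * v for v in clean) / len(clean)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Gain g such that p_clean / (g**2 * p_noise) == 10 ** (target_db / 10)
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (target_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR (dB) of a noisy signal, given the clean reference it was built from."""
    p_clean = sum(c * c for c in clean)
    p_resid = sum((y - c) ** 2 for c, y in zip(clean, noisy))
    return 10.0 * math.log10(p_clean / p_resid)


# Synthetic example: a 120 Hz tone mixed with Gaussian noise at several SNR levels.
rng = random.Random(0)
fs = 8000
clean = [math.sin(2 * math.pi * 120 * i / fs) for i in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
noisy_by_snr = {db: mix_at_snr(clean, noise, db) for db in (0, 10, 20, 30)}
```

At 0 dB the noise carries as much power as the signal, while at 30 dB the noise power is one thousandth of the signal power.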

Key words

speech signal, noise robustness, empirical mode decomposition, prosodic characteristics, psycho-emotional state 


 

Created: 14.01.2020 10:06
Updated: 22.01.2020 13:34